Margin-based Ranking and an Equivalence between AdaBoost and RankBoost

Authors: Cynthia Rudin, Robert E. Schapire
Publication venue: 'MIT Press - Journals'
Publication date: 01/01/2009
We study boosting algorithms for learning to rank. We give a general margin-based bound for ranking based on covering numbers for the hypothesis space. Our bound suggests that algorithms that maximize the ranking margin will generalize well. We then describe a new algorithm, smooth margin ranking, that precisely converges to a maximum ranking-margin solution. The algorithm is a modification of RankBoost, analogous to “approximate coordinate ascent boosting.” Finally, we prove that AdaBoost and RankBoost are equally good for the problems of bipartite ranking and classification in terms of their asymptotic behavior on the training set. Under natural conditions, AdaBoost achieves an area under the ROC curve that is as good as RankBoost’s; furthermore, RankBoost, when given a specific intercept, achieves a misclassification error that is as good as AdaBoost’s. This may help to explain the empirical observations made by Cortes and Mohri, and Caruana and Niculescu-Mizil, about the excellent performance of AdaBoost as a bipartite ranking algorithm, as measured by the area under the ROC curve.
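The two quantities this abstract compares can be computed directly from a scoring function. The sketch below is only an illustration, not code from the paper: the function names and the toy scores are assumptions. It computes the area under the ROC curve as the fraction of correctly ordered positive/negative pairs, and the misclassification error of the classifier obtained by thresholding the same scores at an intercept.

```python
from itertools import product


def pairwise_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    Labels are assumed to be +1 / -1.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y != 1]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for p, n in product(pos, neg):
        if p > n:
            correct += 1.0
        elif p == n:
            correct += 0.5
    return correct / (len(pos) * len(neg))


def misclassification_error(scores, labels, intercept=0.0):
    """Error rate of the classifier sign(score - intercept)."""
    wrong = sum(
        1 for s, y in zip(scores, labels)
        if (1 if s - intercept > 0 else -1) != y
    )
    return wrong / len(labels)


# Hypothetical scores from some ranking function, with +1 / -1 labels.
scores = [2.0, 1.5, 0.5, 0.2, -1.0]
labels = [1, 1, -1, 1, -1]

auc = pairwise_auc(scores, labels)
err = misclassification_error(scores, labels, intercept=0.0)
print(f"AUC = {auc:.4f}, error at intercept 0 = {err:.2f}")
```

Adding a constant to every score leaves the AUC unchanged but can change the misclassification error, which is why the intercept matters only for the classification side of the comparison.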

CiteSeerX

DSpace@MIT

Towards Minimax Online Learning with Unknown Time Horizon

Authors: Haipeng Luo, Robert E. Schapire
Publication date: 06/10/2013
We consider online learning when the time horizon is unknown. We apply a minimax analysis, beginning with the fixed-horizon case and then moving on to two unknown-horizon settings: one assumes the horizon is chosen randomly according to some known distribution, and the other allows the adversary full control over the horizon. For the random horizon setting with restricted losses, we derive a fully optimal minimax algorithm. For the adversarial horizon setting, we prove a nontrivial lower bound which shows that the adversary obtains strictly more power than when the horizon is fixed and known. Based on the minimax solution of the random horizon setting, we then propose a new adaptive algorithm which "pretends" that the horizon is drawn from a distribution from a special family, but no matter how the actual horizon is chosen, the worst-case regret is of the optimal rate. Furthermore, our algorithm can be combined and applied in many ways, for instance, to online convex optimization, follow the perturbed leader, the exponential weights algorithm, and first-order bounds. Experiments show that our algorithm outperforms many other existing algorithms in an online linear optimization setting.
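For orientation, here is a minimal sketch of the unknown-horizon problem using the exponential weights algorithm mentioned in the abstract. It is not the adaptive algorithm proposed in the paper: it uses the standard time-varying learning rate, taken here as eta_t = sqrt(8 ln N / t), so that no horizon has to be fixed in advance. The function name and the synthetic loss sequence are assumptions made for illustration.

```python
import math


def anytime_exponential_weights(loss_rows):
    """Run exponential weights with a horizon-free learning rate.

    loss_rows[t][i] is the loss in [0, 1] of expert i at round t.
    Returns (expected cumulative loss of the learner, loss of the best expert).
    """
    n = len(loss_rows[0])
    cumulative = [0.0] * n
    learner_loss = 0.0
    for t, losses in enumerate(loss_rows, start=1):
        # The learning rate depends only on the current round, not on the horizon.
        eta = math.sqrt(8.0 * math.log(n) / t)
        base = min(cumulative)  # subtract the minimum for numerical stability
        weights = [math.exp(-eta * (c - base)) for c in cumulative]
        total = sum(weights)
        probs = [w / total for w in weights]
        # Expected loss of a learner that samples an expert from probs.
        learner_loss += sum(p * l for p, l in zip(probs, losses))
        cumulative = [c + l for c, l in zip(cumulative, losses)]
    return learner_loss, min(cumulative)


# Synthetic, deterministic losses for three hypothetical experts.
T = 200
loss_rows = [[float(t % 2), 0.4, ((t * 7) % 10) / 10.0] for t in range(T)]
learner, best = anytime_exponential_weights(loss_rows)
regret = learner - best
print(f"learner = {learner:.2f}, best expert = {best:.2f}, regret = {regret:.2f}")
```

The loop can be stopped at any round without changing the rule used up to that point, which is the property an unknown-horizon method needs.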

arXiv.org e-Print Archive

CiteSeerX

Princeton University Open Access Repository

The Rate of Convergence of AdaBoost

Authors: Indraneel Mukherjee, Cynthia Rudin, Robert E. Schapire
Publication date: 29/06/2011
The AdaBoost algorithm was designed to combine many "weak" hypotheses that perform slightly better than random guessing into a "strong" hypothesis that has very low error. We study the rate at which AdaBoost iteratively converges to the minimum of the "exponential loss." Unlike previous work, our proofs do not require a weak-learning assumption, nor do they require that minimizers of the exponential loss are finite. Our first result shows that at iteration $t$, the exponential loss of AdaBoost's computed parameter vector will be at most $\epsilon$ more than that of any parameter vector of $\ell_1$-norm bounded by $B$ in a number of rounds that is at most a polynomial in $B$ and $1/\epsilon$. We also provide lower bounds showing that a polynomial dependence on these parameters is necessary. Our second result is that within $C/\epsilon$ iterations, AdaBoost achieves a value of the exponential loss that is at most $\epsilon$ more than the best possible value, where $C$ depends on the dataset. We show that this dependence of the rate on $\epsilon$ is optimal up to constant factors, i.e., at least $\Omega(1/\epsilon)$ rounds are necessary to achieve within $\epsilon$ of the optimal exponential loss.

Comment: A preliminary version will appear in COLT 201
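The convergence behavior described above can be observed on a tiny example. The sketch below treats AdaBoost as greedy coordinate descent on the exponential loss; the matrix `M` (with entries y_i * h_j(x_i)), the function names, and the four-example dataset are illustrative assumptions, not taken from the paper. The dataset is deliberately non-separable, and its exponential loss has infimum 0.5 that no finite parameter vector attains, which is the regime the abstract says its proofs cover.

```python
import math


def exp_loss(M, lam):
    """Average exponential loss (1/m) * sum_i exp(-(M lam)_i)."""
    return sum(
        math.exp(-sum(mij * lj for mij, lj in zip(row, lam))) for row in M
    ) / len(M)


def adaboost(M, rounds):
    """AdaBoost as coordinate descent; M[i][j] = y_i * h_j(x_i) in {-1, +1}."""
    m, n = len(M), len(M[0])
    lam = [0.0] * n
    losses = [exp_loss(M, lam)]
    for _ in range(rounds):
        # Example weights are proportional to exp(-margin).
        w = [math.exp(-sum(mij * lj for mij, lj in zip(row, lam))) for row in M]
        z = sum(w)
        d = [wi / z for wi in w]
        # Edge of each weak hypothesis under the current weights.
        edges = [sum(d[i] * M[i][j] for i in range(m)) for j in range(n)]
        j = max(range(n), key=lambda k: abs(edges[k]))
        r = edges[j]
        if r == 0.0 or abs(r) >= 1.0:
            break  # no progress possible, or the step would be infinite
        lam[j] += 0.5 * math.log((1.0 + r) / (1.0 - r))
        losses.append(exp_loss(M, lam))
    return lam, losses


# Rows 2 and 3 conflict, so no parameter vector classifies every example.
M = [
    [1, 1],
    [1, -1],
    [-1, 1],
    [1, 1],
]
lam, losses = adaboost(M, rounds=50)
print(f"initial loss = {losses[0]:.4f}, final loss = {losses[-1]:.4f}")
```

Each round multiplies the loss by sqrt(1 - r^2), where r is the chosen edge, so the loss never increases; on this dataset it approaches 0.5 while the parameter vector keeps growing.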

arXiv.org e-Print Archive

CiteSeerX

DSpace@MIT

Generalization bounds for averaged classifiers

Authors: Yoav Freund, Yishay Mansour, Robert E. Schapire
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2004
We study a simple learning algorithm for binary classification. Instead of predicting with the best hypothesis in the hypothesis class, that is, the hypothesis that minimizes the training error, our algorithm predicts with a weighted average of all hypotheses, weighted exponentially with respect to their training error. We show that the prediction of this algorithm is much more stable than the prediction of an algorithm that predicts with the best hypothesis. By allowing the algorithm to abstain from predicting on some examples, we show that the predictions it makes when it does not abstain are very reliable. Finally, we show that the probability that the algorithm abstains is comparable to the generalization error of the best hypothesis in the class.

Comment: Published by the Institute of Mathematical Statistics (http://www.imstat.org) in the Annals of Statistics (http://www.imstat.org/aos/) at http://dx.doi.org/10.1214/00905360400000005
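A rough sketch of the idea in this abstract follows. Every hypothesis receives a weight that decays exponentially in its training error, and the predictor abstains when the weighted vote is close to tied. The abstention rule used here (a threshold `delta` on the normalized weighted average), the parameter `eta`, the threshold hypothesis class, and the toy data are all assumptions for illustration; the paper's exact prediction rule may differ.

```python
import math


def make_threshold(t):
    """Hypothetical hypothesis: predict +1 if x > t, else -1."""
    return lambda x: 1 if x > t else -1


def exp_weighted_predictor(hypotheses, xs, ys, eta):
    """Weight each hypothesis by exp(-eta * training_error) and normalize."""
    errors = [
        sum(1 for x, y in zip(xs, ys) if h(x) != y) / len(xs)
        for h in hypotheses
    ]
    raw = [math.exp(-eta * e) for e in errors]
    total = sum(raw)
    weights = [w / total for w in raw]

    def predict(x, delta):
        # Weighted vote in [-1, 1]; return 0 (abstain) when it is near a tie.
        avg = sum(w * h(x) for w, h in zip(weights, hypotheses))
        if abs(avg) <= delta:
            return 0
        return 1 if avg > 0 else -1

    return predict


# Toy one-dimensional training set and five threshold hypotheses.
xs = [0.5, 1.5, 2.5, 3.5]
ys = [-1, -1, 1, 1]
hypotheses = [make_threshold(t) for t in (0, 1, 2, 3, 4)]

predict = exp_weighted_predictor(hypotheses, xs, ys, eta=2.0)
for x in (0.2, 2.0, 3.8):
    print(x, predict(x, delta=0.5))
```

Points far from the decision boundary receive a confident label, while a point near it (here x = 2.0) falls below the vote threshold and triggers an abstention.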

arXiv.org e-Print Archive

CiteSeerX

Crossref